Loading data from a file

Table of contents

Problem

You want to load data from a file.

Solution

Delimited text files

The simplest way to import data is to save it as a text file with delimiters such as tabs or commas (CSV).

data <- read.csv("datafile.csv")

# Load a CSV file that doesn't have headers
data <- read.csv("datafile-noheader.csv", header=FALSE)

The function read.table() is a more general function which allows you to set the delimiter, whether or not there are headers, whether strings are set off with quotes, and more.

data <- read.table("datafile-noheader.csv",
                   header=FALSE,
                   sep= ","         # use "\t" for tab-delimited files
                   )

Data can also be loaded from a URL. These (very long) URLs point to the attachments on this page.

data <- read.csv("http://wiki.stdout.org/rcookbook/Data%20input%20and%20output/Loading%20data%20from%20a%20file/datafile.csv")

# Read in a CSV file without headers
data <- read.csv("http://wiki.stdout.org/rcookbook/Data%20input%20and%20output/Loading%20data%20from%20a%20file/datafile-noheader.csv", header=FALSE)

# Manually assign the header names
names(data) <- c("subject","sex","control","cond1","cond2")

The data files used above:

datafile.csv:

"subject","sex","control","cond1","cond2"
1,"M",7.9,12.3,10.7
2,"F",6.3,10.6,11.1
3,"F",9.5,13.1,13.8
4,"M",11.5,13.4,12.9

datafile-noheader.csv:

1,"M",7.9,12.3,10.7
2,"F",6.3,10.6,11.1
3,"F",9.5,13.1,13.8
4,"M",11.5,13.4,12.9

Fixed-width text files

Suppose your data has fixed-width columns, like this:

subject    sex control cond1 cond2
      1      M     7.9  12.3  10.7
      2      F     6.3  10.6  11.1
      3      F     9.5  13.1  13.8
      4      M    11.5  13.4  12.9

One way to read it in is to simply use read.table() as above, which will treat consecutive spaces as a single delimiter.

read.table("clipboard", header=TRUE, strip.white=TRUE)

However, your data file may have columns containing spaces, or columns with no spaces separating them, like this, where the scores column represents six different measurements, each from 0 to 3.

subject  sex  scores
   N  1    M  113311
   NE 2    F  112231
   S  3    F  111221
   W  4    M  011002

In this case, you may need to use the read.fwf() function. If you read the column names from the file, it requires that they be separated with a delimiter like a single tab, space, or comma. If they are separated with multiple spaces, as in this example, you will have to assign the column names directly.

# Assign the column names manually
read.fwf("myfile.txt", 
         c(7,5,-2,1,1,1,1,1,1), # Width of the columns. -2 means drop those columns
         skip=1,                # Skip the first line (contains header here)
         col.names=c("subject","sex","s1","s2","s3","s4","s5","s6"),
         strip.white=TRUE)      # Strip out leading and trailing whitespace when reading each
# subject sex s1 s2 s3 s4 s5 s6
#    N  1   M  1  1  3  3  1  1
#    NE 2   F  1  1  2  2  3  1
#    S  3   F  1  1  1  2  2  1
#    W  4   M  0  1  1  0  0  2

# If the first row looked like this:
# subject,sex,scores
# Then we could use header=TRUE:
read.fwf("myfile.txt", c(7,5,-2,1,1,1,1,1,1), header=TRUE, strip.white=TRUE)

Excel files

See http://wiki.r-project.org/rwiki/doku.php?id=tips:data-io:ms_windows.